We introduce TeSS (Text Similarity Comparison using Sentence Encoder), a framework for zero-shot classification where the assigned label is determined by the embedding similarity between the input text and each candidate label prompt. We leverage representations from sentence encoders that are optimized during pre-training to place semantically similar samples closer to each other in embedding space. The label prompt embeddings serve as prototypes of their corresponding class clusters. Furthermore, to compensate for labels that may be poorly descriptive in their original format, we retrieve semantically similar sentences from external corpora and use them alongside the original label prompt (TeSS-R). TeSS outperforms strong baselines on various closed-set and open-set classification datasets under the zero-shot setting, with further gains when combined with label prompt diversification through retrieval. These results are robust to verbalizer variations, an ancillary benefit of using a bi-encoder. Altogether, our method serves as a reliable baseline for zero-shot classification and a simple interface for assessing the quality of sentence encoders.
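The similarity-based labeling described above can be illustrated with a minimal sketch. It assumes cosine similarity as the similarity measure and uses hand-written 3-dimensional vectors in place of real sentence-encoder outputs; the names `cosine`, `classify`, and `label_embs` are hypothetical and not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def classify(text_emb, label_embs):
    """Return the label whose prompt embedding is most similar to text_emb."""
    return max(label_embs, key=lambda label: cosine(text_emb, label_embs[label]))

# Assumption: toy embeddings standing in for sentence-encoder outputs
# of the label prompts and of the input text.
label_embs = {
    "sports": [0.9, 0.1, 0.0],
    "politics": [0.1, 0.9, 0.1],
    "technology": [0.0, 0.2, 0.9],
}
text_emb = [0.8, 0.2, 0.1]
predicted = classify(text_emb, label_embs)
```

In a real setting, both the label prompts and the input text would be passed through the same sentence encoder before this comparison.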
Neural radiance field (NeRF) has attracted attention as a promising approach to reconstructing 3D scenes. Since NeRF emerged, subsequent studies have modeled dynamic scenes, which include motions or topological changes. However, most of them use an additional deformation network, which slows down training and rendering. Tensorial radiance field (TensoRF) has recently shown its potential for fast, high-quality reconstruction of static scenes with a compact model size. In this paper, we present D-TensoRF, a tensorial radiance field for dynamic scenes that enables novel view synthesis at a specific time. We consider the radiance field of a dynamic scene as a 5D tensor. The 5D tensor represents a 4D grid in which each axis corresponds to X, Y, Z, and time, and which has 1D multi-channel features per element. Similar to TensoRF, we decompose the grid either into rank-one vector components (CP decomposition) or low-rank matrix components (newly proposed MM decomposition). We also use smoothing regularization to reflect the relationship between features at different times (temporal dependency). We conduct extensive evaluations to analyze our models. We show that D-TensoRF with either CP or MM decomposition has short training times and significantly low memory footprints, with rendering results that are quantitatively and qualitatively competitive with state-of-the-art methods in 3D dynamic scene modeling.
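To make the CP idea concrete, the following sketch stores a 5D tensor (X, Y, Z, time, channels) as a sum of rank-one components and reconstructs the feature vector of a single grid cell on demand. It is a generic CP factorization with random factors and integer indexing, not D-TensoRF's actual parametrization; the grid sizes, rank, and function names are assumptions chosen for illustration.

```python
import random

def make_cp_factors(rank, dims, seed=0):
    """Create `rank` components, each holding one vector per axis in `dims`."""
    rng = random.Random(seed)
    return [[[rng.uniform(-1, 1) for _ in range(d)] for d in dims]
            for _ in range(rank)]

def cp_lookup(factors, x, y, z, t):
    """Reconstruct the channel features at grid cell (x, y, z, t)."""
    channels = len(factors[0][4])
    feat = [0.0] * channels
    for vx, vy, vz, vt, vc in factors:
        # Each rank-one component contributes a scalar weight times its channel vector.
        weight = vx[x] * vy[y] * vz[z] * vt[t]
        for c in range(channels):
            feat[c] += weight * vc[c]
    return feat

# Hypothetical sizes: a 64^3 spatial grid, 30 time steps, 8 feature channels.
dims = (64, 64, 64, 30, 8)
rank = 16
factors = make_cp_factors(rank, dims)
feature = cp_lookup(factors, 10, 20, 30, 5)

# Parameter counts: the dense grid stores every cell, CP stores only the vectors.
dense_params = 1
for d in dims:
    dense_params *= d
cp_params = rank * sum(dims)
```

The gap between `dense_params` and `cp_params` is the source of the small memory footprint that factorized representations of this kind can achieve.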
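Federated learning (FL) is an active area of research. One of the most suitable areas for adopting FL is the medical domain, where patient privacy must be respected. However, previous research has not fully considered who is most likely to use FL in the medical domain. It is not hospitals that are eager to adopt FL, but service providers who want to develop machine learning models with real patient records. Moreover, service providers want to maximize model performance at the lowest possible cost. In this work, we propose empirical benchmarks of FL methods that consider both performance and monetary cost on three real-world datasets: electronic health records, skin cancer images, and electrocardiogram datasets. We also propose Federated learning with Proximal regularization eXcept local Normalization (FedPxN), which, using a simple combination of FedProx and FedBN, outperforms all other FL algorithms while consuming only slightly more than the most efficient method.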
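One plausible reading of combining FedProx and FedBN is sketched below: the FedProx proximal penalty is applied to all parameters except normalization ones, and, as in FedBN, normalization parameters are left out of server-side averaging. This is an assumption about how the two pieces fit together, not the paper's implementation; the `bn` name prefix, the equal-weight average, and all function names are hypothetical.

```python
def is_norm_param(name):
    # Assumption: normalization parameters are identified by a "bn" name prefix.
    return name.startswith("bn")

def proximal_penalty(local, global_model, mu):
    """FedProx-style term (mu / 2) * ||w_local - w_global||^2, skipping norm params."""
    total = 0.0
    for name, values in local.items():
        if is_norm_param(name):
            continue
        total += sum((w - g) ** 2 for w, g in zip(values, global_model[name]))
    return 0.5 * mu * total

def aggregate(client_models):
    """Equal-weight average of shared parameters; norm params stay on each client."""
    shared = {}
    for name in client_models[0]:
        if is_norm_param(name):
            continue
        columns = zip(*(m[name] for m in client_models))
        shared[name] = [sum(col) / len(client_models) for col in columns]
    return shared

# Two toy clients with one shared layer and one local normalization parameter.
clients = [
    {"fc.weight": [1.0, 2.0], "bn.scale": [0.5]},
    {"fc.weight": [3.0, 4.0], "bn.scale": [1.5]},
]
global_model = aggregate(clients)
penalty = proximal_penalty(clients[0], global_model, mu=0.1)
```

During local training, each client would add `penalty` to its task loss; a production system would typically also weight the average by client sample counts.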
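Most current TTS datasets are collections of individual utterances and contain few conversational aspects in terms of both style and metadata. In this paper, we introduce DailyTalk, a high-quality conversational speech dataset designed for text-to-speech. We sampled, modified, and recorded 2,541 dialogues from the open-domain dialogue dataset DailyDialog, each long enough to represent the context of the dialogue. During the data construction step, we maintained the attribute distribution originally annotated in DailyDialog to support diverse dialogues in DailyTalk. On top of the dataset, we extend prior work as our baseline, in which a non-autoregressive TTS model is conditioned on historical information in a dialogue. We gather metadata so that a TTS model can learn historical dialogue information, which is key to generating context-aware speech. From the baseline experiment results, we show that DailyTalk can be used to train neural text-to-speech models and that our baseline can represent contextual information. The DailyTalk dataset and baseline code are freely available under the CC-BY-SA 4.0 license.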
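In image classification, "debiasing" aims to train a classifier to be less susceptible to dataset bias, the strong correlation between peripheral attributes of data samples and a target class. For example, even if the frog class in a dataset mainly consists of frog images with a swamp background (i.e., bias-aligned samples), a debiased classifier should be able to correctly classify a frog at a beach (i.e., a bias-conflicting sample). Recent debiasing approaches commonly use two components for debiasing: a biased model $f_b$ and a debiased model $f_d$. $f_b$ is trained to focus on bias-aligned samples (i.e., it is overfitted to the bias), while $f_d$ is mainly trained with bias-conflicting samples by concentrating on samples that $f_b$ fails to learn, which makes $f_d$ less susceptible to dataset bias. While state-of-the-art debiasing techniques have aimed to better train $f_d$, we focus on training $f_b$, a component that has been overlooked until now. Our empirical analysis reveals that removing bias-conflicting samples from the training set of $f_b$ is important for improving the debiasing performance of $f_d$. This is because bias-conflicting samples interfere with amplifying the bias of $f_b$, since those samples do not include the bias attribute. To this end, we propose a simple yet effective data sample selection method that removes bias-conflicting samples to construct a bias-amplified dataset for training $f_b$. Our data sample selection method can be directly applied to existing reweighting-based debiasing approaches, obtaining consistent performance boosts and achieving state-of-the-art performance on both synthetic and real-world datasets.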
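Explaining the decisions of an Artificial Intelligence (AI) model is increasingly critical in many real-world, high-stakes applications. Hundreds of papers have proposed new feature attribution methods or have discussed or harnessed these tools in their work. However, despite humans being the target end-users, most attribution methods have only been evaluated on proxy automatic-evaluation metrics (Zhang et al., 2018; Zhou et al., 2016; Petsiuk et al., 2018). In this paper, we conduct the first user study to measure the effectiveness of attribution maps in assisting humans with ImageNet classification and Stanford Dogs fine-grained classification, both when an image is natural and when it is adversarial (i.e., contains adversarial perturbations). Overall, feature attribution is surprisingly not more effective than showing humans nearest training-set examples. On the harder task of fine-grained dog classification, presenting attribution maps to humans does not help, but instead hurts the performance of human-AI teams compared to AI alone. Importantly, we find that automatic attribution-map evaluation measures correlate poorly with actual human-AI team performance. Our findings encourage the community to rigorously test their methods on downstream human-in-the-loop applications and to rethink existing evaluation metrics.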
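Explaining the behavior of deep neural networks, which are usually considered black boxes, is critical, especially now that they are being adopted across diverse aspects of human life. Taking advantage of interpretable machine learning (interpretable ML), this paper proposes a novel tool called Catastrophic Forgetting Dissector (or CFD) to explain catastrophic forgetting in continual learning settings. We also introduce a new method called Critical Freezing, based on the observations made with our tool. Experiments on ResNet articulate how catastrophic forgetting happens, in particular showing which components of this famous network are forgetting. Our new continual learning algorithm beats various recent techniques by a large margin, demonstrating the value of the investigation. Critical Freezing not only attacks catastrophic forgetting but also exposes explainability.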
3D-aware image synthesis focuses on preserving spatial consistency in addition to generating high-resolution images with fine details. Recently, Neural Radiance Field (NeRF) has been introduced for synthesizing novel views with low computational cost and superior performance. While several works investigate generative NeRFs and show remarkable results, they cannot handle conditional and continuous feature manipulation in the generation procedure. In this work, we introduce a novel model, called Class-Continuous Conditional Generative NeRF ($\text{C}^{3}$G-NeRF), which can synthesize conditionally manipulated photorealistic 3D-consistent images by projecting conditional features to the generator and the discriminator. The proposed $\text{C}^{3}$G-NeRF is evaluated on three image datasets: AFHQ, CelebA, and Cars. Our model shows strong 3D consistency with fine details and smooth interpolation in conditional feature manipulation. For instance, $\text{C}^{3}$G-NeRF achieves a Fr\'echet Inception Distance (FID) of 7.64 in 3D-aware face image synthesis at a $\text{128}^{2}$ resolution. Additionally, we provide FIDs of generated 3D-aware images for each class of the datasets, since $\text{C}^{3}$G-NeRF makes it possible to synthesize class-conditional images.
In both terrestrial and marine ecology, physical tagging is a frequently used method to study population dynamics and behavior. However, such tagging techniques are increasingly being replaced by individual re-identification using image analysis. This paper introduces a contrastive learning-based model for identifying individuals. The model uses the first parts of the Inception v3 network, supported by a projection head, and we use contrastive learning to find similar or dissimilar image pairs from a collection of uniform photographs. We apply this technique to corkwing wrasse, Symphodus melops, an ecologically and commercially important fish species. Photos are taken during repeated catches of the same individuals from a wild population, where the intervals between individual sightings might range from a few days to several years. Our model achieves a one-shot accuracy of 0.35, a 5-shot accuracy of 0.56, and a 100-shot accuracy of 0.88 on our dataset.
Feature selection helps reduce data acquisition costs in ML, but the standard approach is to train models with static feature subsets. Here, we consider the dynamic feature selection (DFS) problem where a model sequentially queries features based on the presently available information. DFS is often addressed with reinforcement learning (RL), but we explore a simpler approach of greedily selecting features based on their conditional mutual information. This method is theoretically appealing but requires oracle access to the data distribution, so we develop a learning approach based on amortized optimization. The proposed method is shown to recover the greedy policy when trained to optimality and outperforms numerous existing feature selection methods in our experiments, thus validating it as a simple but powerful approach for this problem.
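The greedy policy mentioned above can be written down directly when the joint distribution is known, which is the oracle setting the abstract says is unavailable in practice. The sketch below computes conditional mutual information exactly for a small discrete joint and queries features greedily; it does not implement the amortized learning approach, and the toy distribution and function names are assumptions for illustration.

```python
import math
from collections import defaultdict

def cmi(joint, observed, i):
    """I(y; x_i | x_S = observed) in nats, for a discrete joint {(x, y): prob}."""
    p_vy, p_v, p_y = defaultdict(float), defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        # Keep only outcomes consistent with the features observed so far.
        if all(x[j] == val for j, val in observed.items()):
            p_vy[(x[i], y)] += p
            p_v[x[i]] += p
            p_y[y] += p
    z = sum(p_y.values())  # probability of the observed values
    return sum(
        (p / z) * math.log(p * z / (p_v[v] * p_y[y]))
        for (v, y), p in p_vy.items() if p > 0
    )

def greedy_query_order(joint, x_true, budget):
    """Query `budget` features, each time picking the one with the highest CMI."""
    observed, order = {}, []
    for _ in range(budget):
        candidates = [i for i in range(len(x_true)) if i not in observed]
        best = max(candidates, key=lambda i: cmi(joint, observed, i))
        observed[best] = x_true[best]
        order.append(best)
    return order

# Toy oracle: two uniform binary features, the label equals x0, x1 is pure noise.
joint = {((a, b), a): 0.25 for a in (0, 1) for b in (0, 1)}
order = greedy_query_order(joint, x_true=(1, 0), budget=2)
first_gain = cmi(joint, {}, 0)
```

Because the label depends only on the first feature, the policy queries it first; once it is observed, the remaining feature carries no further information about the label.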